| Frame | Time | Anger | Contempt | Disgust | Fear | Joy | Sad | Surprise | Neutral | ID |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0000 | 0.0101 | 0.0218 | 0.0043 | 0.0541 | 0.5260 | 0.0959 | 0.0010 | 0.2868 | T001-001 |
| 1 | 0.0333 | 0.0101 | 0.0218 | 0.0043 | 0.0541 | 0.5260 | 0.0959 | 0.0010 | 0.2868 | T001-001 |
| 2 | 0.0667 | 0.0101 | 0.0218 | 0.0043 | 0.0541 | 0.5260 | 0.0959 | 0.0010 | 0.2868 | T001-001 |
| 3 | 0.1000 | 0.0080 | 0.0187 | 0.0032 | 0.0375 | 0.5353 | 0.1050 | 0.0011 | 0.2911 | T001-001 |
| 4 | 0.1333 | 0.0091 | 0.0380 | 0.0158 | 0.0036 | 0.6902 | 0.0177 | 0.0004 | 0.2252 | T001-001 |
| 5 | 0.1667 | 0.0104 | 0.0450 | 0.0139 | 0.0030 | 0.7157 | 0.0162 | 0.0003 | 0.1955 | T001-001 |

| Start | End | Event.Switch | Event.Type | Event | ID |
|---|---|---|---|---|---|
| 86.5 | 246.50 | 1 | 1 | Analytical Questions | T001-005 |
| 508.5 | 657.50 | 1 | 2 | Mathematical Questions | T001-005 |
| 107.5 | 269.25 | 1 | 3 | Emotional Questions | T001-006 |
| 521.0 | 674.75 | 1 | 3 | Emotional Questions | T001-006 |
| 81.0 | 240.00 | 1 | 4 | Texting | T001-007 |
| 510.0 | 671.00 | 1 | 4 | Texting | T001-007 |

Sample of Cleaned Data Showing an Event Transition

| Subject | Trial | Age | Gender | Frame | Time | Event.Switch | Event | Action | Anger | Contempt | Disgust | Fear | Joy | Sad | Surprise | Neutral | Texting |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| T001 | 007 | Y | M | 2427 | 80.900 | 0 | No Event | 0 | 0.0909 | 0.0575 | 0.4205 | 3e-04 | 0.0011 | 0.1343 | 0 | 0.2954 | 0 |
| T001 | 007 | Y | M | 2428 | 80.933 | 0 | No Event | 0 | 0.0612 | 0.0397 | 0.4293 | 4e-04 | 0.0011 | 0.1630 | 0 | 0.3052 | 0 |
| T001 | 007 | Y | M | 2429 | 80.967 | 0 | No Event | 0 | 0.1034 | 0.0963 | 0.3186 | 2e-04 | 0.0013 | 0.0856 | 0 | 0.3946 | 0 |
| T001 | 007 | Y | M | 2430 | 81.000 | 1 | Texting | 4 | 0.0363 | 0.4976 | 0.0171 | 1e-04 | 0.0024 | 0.0069 | 0 | 0.4396 | 1 |
| T001 | 007 | Y | M | 2431 | 81.033 | 1 | Texting | 4 | 0.0059 | 0.7285 | 0.0027 | 4e-04 | 0.0068 | 0.0063 | 0 | 0.2493 | 1 |
| T001 | 007 | Y | M | 2432 | 81.067 | 1 | Texting | 4 | 0.0058 | 0.6890 | 0.0035 | 4e-04 | 0.0077 | 0.0068 | 0 | 0.2868 | 1 |
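
The transition in the table above can be reproduced by matching each frame's time against the event windows. The following is a minimal sketch, not the original cleaning script: the data frame names `frames` and `events` are hypothetical, the 30 frames-per-second rate is inferred from the Frame and Time columns, and treating the window end as inclusive is an assumption.

```r
# Hypothetical frame-level data at 30 frames per second (Time = Frame / 30)
frames <- data.frame(Frame = 2427:2432)
frames$Time <- round(frames$Frame / 30, 3)

# Event window copied from the event table for trial T001-007
events <- data.frame(Start = 81.0, End = 240.0, Event.Type = 4,
                     Event = "Texting", stringsAsFactors = FALSE)

# Default every frame to "No Event", then overwrite frames inside a window
frames$Event.Switch <- 0
frames$Event <- "No Event"
frames$Action <- 0
for (i in seq_len(nrow(events))) {
  # Assumption: both window boundaries are inclusive
  inside <- frames$Time >= events$Start[i] & frames$Time <= events$End[i]
  frames$Event.Switch[inside] <- 1
  frames$Event[inside] <- events$Event[i]
  frames$Action[inside] <- events$Event.Type[i]
}

# Binary response used for modeling: 1 while texting, 0 otherwise
frames$Texting <- as.integer(frames$Event == "Texting")
print(frames)
```

With these inputs, frames 2427 to 2429 stay at "No Event" and frames 2430 onward are flagged as Texting, matching the switch at Time 81.000 in the sample.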
Reproducible Research
Takeaways
Differences in variation between the trials suggest that it may be possible to build a model capable of predicting a texting event
Subject-specific plots are distinct enough that an individual subject variable may be needed in modeling
Baseline Trial: Trial 4 was used as a baseline trial because the conditions were identical to the Texting Trial (dense traffic with detour)
Model Proposal:
Feed-Forward Neural Networks
Neural Network Components
Step 1: Model is Initialized with Random Weights
Step 2: Calculate Hidden Node Values and Output Node Prediction
Step 3: Update Weights Based on Error
Step 4: Repeat Steps 2-3 to Keep Updating the Weights Until the Iteration Limit is Reached
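
The four steps above can be sketched in a few lines of base R. This is an illustration only, under stated assumptions: the toy data, the three hidden nodes, the learning rate, and plain gradient descent are all choices made for the example. The nnet package itself fits its weights with a quasi-Newton (BFGS) optimizer rather than the simple update shown here.

```r
set.seed(1)
sigmoid <- function(z) 1 / (1 + exp(-z))

# Toy data (hypothetical, not the study data): 200 rows, 2 inputs, binary target
X <- matrix(rnorm(400), ncol = 2)
y <- as.numeric(X[, 1] + X[, 2] > 0)

n_hidden <- 3
# Step 1: initialize weights randomly (the first row of each matrix is the bias)
W1 <- matrix(runif(3 * n_hidden, -0.5, 0.5), nrow = 3)
W2 <- matrix(runif(n_hidden + 1, -0.5, 0.5), ncol = 1)

rate <- 0.5
loss <- numeric(200)
for (it in 1:200) {
  # Step 2: forward pass through the hidden nodes to the output prediction
  H <- sigmoid(cbind(1, X) %*% W1)
  p <- sigmoid(cbind(1, H) %*% W2)
  loss[it] <- -mean(y * log(p) + (1 - y) * log(1 - p))

  # Step 3: backpropagate the cross-entropy error and update the weights
  d_out <- (p - y) / length(y)
  grad_W2 <- t(cbind(1, H)) %*% d_out
  d_hid <- (d_out %*% t(W2[-1, , drop = FALSE])) * H * (1 - H)
  grad_W1 <- t(cbind(1, X)) %*% d_hid
  W2 <- W2 - rate * grad_W2
  W1 <- W1 - rate * grad_W1
  # Step 4: the loop repeats Steps 2-3 until the iteration limit
}
cat("first loss:", loss[1], " last loss:", loss[200], "\n")
```

The iteration cap plays the same role as the MaxItr setting in the model tables: a larger cap gives the weights more updates before training stops.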
General Model Form
\[ \begin{aligned} \text{nnet}(\text{Texting} = {} & \text{Subject} + \text{Age} + \text{Gender} + \text{Anger} + \text{Contempt} + {} \\ & \text{Disgust} + \text{Fear} + \text{Joy} + \text{Sad} + \text{Surprise} + \text{Neutral}) \end{aligned} \]
Modeling Strategy
Train the same general model on various slices of the data to see what works best
12 total training/testing data sets created from the combination of Data Processing and Data Split methods
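
As a quick check on the count, the 12 data sets are the cross of six processing methods and two split methods. The sketch below uses the labels from the model performance table; the object names `processing`, `splits`, and `datasets` are hypothetical.

```r
# Labels taken from the model performance table
processing <- c("Original", "Differencing", "Moving Avg",
                "½ Sec Cut", "½ Sec Diff", "½ Sec Cut Stat")
splits <- c("365 Split", "Entire Sim")

# expand.grid varies its first argument fastest, so the split alternates
# within each processing method, matching the Model 1-12 numbering
datasets <- expand.grid(Data.Split = splits,
                        Data.Processing = processing,
                        stringsAsFactors = FALSE)
datasets$Model <- paste("Model", seq_len(nrow(datasets)))
print(datasets[, c("Model", "Data.Processing", "Data.Split")])
```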
Data Processing
Data Split
Statistical Software
R's nnet package for feed-forward neural networks
The caret Package
Performance and Validation Testing
Model Search Parameters
Model Performance with 100 Iteration Limit
| Model | Data Processing | Data Split | MaxItr | Size | Decay | Training | Testing | AUC |
|---|---|---|---|---|---|---|---|---|
| Model 1: | Original | 365 Split | 100 | 50 | .20 | .760 | .676 | .734 |
| Model 2: | Original | Entire Sim | 100 | 50 | .20 | .754 | .754 | .847 |
| Model 3: | Differencing | 365 Split | 100 | 10 | .00 | .518 | .516 | .526 |
| Model 4: | Differencing | Entire Sim | 100 | 25 | .10 | .572 | .571 | .637 |
| Model 5: | Moving Avg | 365 Split | 100 | 10 | .00 | .503 | .502 | .527 |
| Model 6: | Moving Avg | Entire Sim | 100 | 10 | .00 | .528 | .528 | .544 |
| Model 7: | ½ Sec Cut | 365 Split | 100 | 50 | .10 | .820 | .698 | .761 |
| Model 8: | ½ Sec Cut | Entire Sim | 100 | 50 | .20 | .788 | .779 | .868 |
| Model 9: | ½ Sec Diff | 365 Split | 100 | 50 | .10 | .633 | .602 | .650 |
| Model 10: | ½ Sec Diff | Entire Sim | 100 | 50 | .20 | .682 | .622 | .681 |
| Model 11: | ½ Sec Cut Stat | 365 Split | 100 | 50 | .10 | .846 | .716 | .781 |
| Model 12: | ½ Sec Cut Stat | Entire Sim | 100 | 50 | .20 | .820 | .803 | .891 |

Additional Training for Best Models
| Model | Data Processing | Data Split | MaxItr | Size | Decay | Training | Testing | AUC |
|---|---|---|---|---|---|---|---|---|
| Model 8: | ½ Sec Cut | Entire Sim | 250 | 50 | .10 | .816 | .804 | .893 |
| Model 8: | ½ Sec Cut | Entire Sim | 500 | 50 | .10 | .828 | .810 | .899 |
| Model 8: | ½ Sec Cut | Entire Sim | 1000 | 50 | .10 | .842 | .820 | .906 |
| Model 12: | ½ Sec Cut Stat | Entire Sim | 250 | 50 | .10 | .858 | .823 | .906 |
| Model 12: | ½ Sec Cut Stat | Entire Sim | 500 | 50 | .20 | .864 | .823 | .907 |
| Model 12: | ½ Sec Cut Stat | Entire Sim | 1000 | 50 | .10 | .871 | .824 | .908 |

library(caret)  # provides trainControl() and train(); method = "nnet" calls the nnet package
## Set Cross Validation
fit.control = trainControl(method = "cv", number = 10)
## Create combination of model parameters to train on
search.grid = expand.grid(decay = c(0, .1, .2),
                          size  = c(1, 10, 25, 50))
## Limit the iterations and weights each model can run
maxIt = 1000; maxWt = 15000
## Fit one network per row of the search grid, scored by 10-fold cross validation
fit = train(Texting ~ . - Time, data = mdl.08.train,
            method    = "nnet",
            trControl = fit.control,
            tuneGrid  = search.grid,
            MaxNWts   = maxWt,
            maxit     = maxIt)
44503 samples, 12 predictors, 2 classes: '0', '1'
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 40053, 40053, 40052, 40052, ...
Resampling results across tuning parameters:
------------------------------
Decay Size Accuracy Kappa
------------------------------
0.0 1 0.6654 0.3042
0.0 10 0.7857 0.5519
0.0 25 0.8135 0.6129
0.0 50 0.8252 0.6375
0.1 1 0.6830 0.3182
0.1 10 0.8052 0.5934
0.1 25 0.8247 0.6352
0.1 50 0.8304 0.6472 ## Best Model
0.2 1 0.6809 0.3126
0.2 10 0.8033 0.5889
0.2 25 0.8196 0.6242
0.2 50 0.8241 0.6336
Reference
Prediction 0 1
0 22736 4616
1 2943 14208
Accuracy : 0.8301
95% CI : (0.8266, 0.8336)
No Information Rate : 0.577
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.6479
Mcnemar's Test P-Value : < 2.2e-16
Sensitivity : 0.8854
Specificity : 0.7548
Pos Pred Value : 0.8312
Neg Pred Value : 0.8284
Balanced Accuracy : 0.8201
Area Under Curve (AUC): 0.906
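
The summary statistics above follow directly from the four cell counts. A minimal sketch of the arithmetic is given below; it assumes class 0 (not texting) is the positive class, which is consistent with the reported sensitivity and specificity. The variable names are hypothetical.

```r
# Confusion matrix from the output above: rows = Prediction, columns = Reference
cm <- matrix(c(22736, 2943, 4616, 14208), nrow = 2,
             dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1")))
n <- sum(cm)

accuracy    <- sum(diag(cm)) / n                  # 0.8301
nir         <- max(colSums(cm)) / n               # 0.577, share of the larger class
sensitivity <- cm["0", "0"] / sum(cm[, "0"])      # 0.8854, class 0 treated as positive
specificity <- cm["1", "1"] / sum(cm[, "1"])      # 0.7548
ppv         <- cm["0", "0"] / sum(cm["0", ])      # 0.8312
npv         <- cm["1", "1"] / sum(cm["1", ])      # 0.8284
balanced    <- (sensitivity + specificity) / 2    # 0.8201

# Cohen's kappa: agreement beyond what the marginal totals would give by chance
expected <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa    <- (accuracy - expected) / (1 - expected)  # 0.6479

print(round(c(accuracy = accuracy, nir = nir, sensitivity = sensitivity,
              specificity = specificity, ppv = ppv, npv = npv,
              balanced = balanced, kappa = kappa), 4))
```

Because class 0 is the positive class here, the 0.7548 specificity is the share of actual texting frames that the model flags as texting.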
Total Accuracy by Subject

| | T022 | T086 | T007 | T006 | T018 | T035 | T083 | T076 | T081 | T064 | T020 | T012 | T074 | T009 | T013 | T088 | T003 | T032 | T011 | T044 | TOP 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Train | 0.981 | 0.960 | 0.919 | 0.943 | 0.940 | 0.956 | 0.956 | 0.949 | 0.929 | 0.922 | 0.931 | 0.928 | 0.925 | 0.914 | 0.907 | 0.937 | 0.907 | 0.915 | 0.916 | .915 | .932 |
| Test | 0.971 | 0.952 | 0.948 | 0.942 | 0.937 | 0.936 | 0.932 | 0.927 | 0.923 | 0.919 | 0.918 | 0.913 | 0.909 | 0.905 | 0.903 | 0.896 | 0.896 | 0.895 | 0.881 | .880 | .919 |
| GenderMale | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 12 |
| AgeOld | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 7 |

| | T080 | T016 | T005 | T060 | T039 | T015 | T008 | T046 | T029 | T079 | T051 | T073 | T082 | T024 | T010 | T001 | T066 | T017 | T033 | T042 | MID 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Train | 0.897 | 0.904 | 0.867 | 0.911 | 0.880 | 0.868 | 0.879 | 0.883 | 0.842 | 0.892 | 0.884 | 0.855 | 0.866 | 0.829 | 0.847 | 0.867 | 0.855 | 0.824 | 0.825 | 0.843 | .865 |
| Test | 0.872 | 0.871 | 0.864 | 0.859 | 0.853 | 0.850 | 0.848 | 0.847 | 0.839 | 0.837 | 0.832 | 0.831 | 0.830 | 0.827 | 0.826 | 0.825 | 0.819 | 0.817 | 0.803 | 0.802 | .837 |
| GenderMale | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 8 |
| AgeOld | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 7 |

| | T031 | T040 | T061 | T036 | T047 | T084 | T077 | T014 | T004 | T021 | T019 | T002 | T054 | T025 | T041 | T034 | T023 | T038 | T027 | BOTTOM 19 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Train | 0.846 | 0.814 | 0.796 | 0.800 | 0.789 | 0.803 | 0.792 | 0.828 | 0.771 | 0.812 | 0.746 | 0.742 | 0.774 | 0.760 | 0.719 | 0.704 | 0.711 | 0.674 | 0.651 | .764 |
| Test | 0.794 | 0.790 | 0.787 | 0.783 | 0.782 | 0.776 | 0.766 | 0.758 | 0.758 | 0.757 | 0.742 | 0.735 | 0.731 | 0.724 | 0.720 | 0.700 | 0.682 | 0.665 | 0.640 | .741 |
| GenderMale | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 10 |
| AgeOld | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 12 |

Evaluating Differences in Age and Gender
******************************************************************
Levene's Test for Homogeneity of Variance
******************************************************************
Df F value Pr(>F)
group 3 0.3182 0.8122
55
******************************************************************
General Linear Model
******************************************************************
Deviance Residuals:
Min 1Q Median 3Q Max
-0.163277 -0.041330 -0.000279 0.059284 0.148769
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.80337 0.02261 35.534 <2e-16 ***
GenderAgeYoung Female 0.05604 0.02953 1.898 0.063 .
GenderAgeOld Male 0.02099 0.03033 0.692 0.492
GenderAgeYoung Male 0.03718 0.03033 1.226 0.226
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for gaussian family taken to be 0.006133847)
Null deviance: 0.36163 on 58 degrees of freedom
Residual deviance: 0.33736 on 55 degrees of freedom
AIC: -127.25
Number of Fisher Scoring iterations: 2
******************************************************************
Shapiro-Wilk Normality Test
******************************************************************
data: mdl$residuals
W = 0.97765, p-value = 0.3482